A technique for classifying objects is proposed that combines classification
estimates based on the properties of object attributes. The technique was
developed as an aid to classifying captured foreign documents on the
battlefield. In this approach, the input information consists of linguistic
assessments of the document's classification; these assessments are based on
document attributes such as document age, format, and place of discovery.
The assessments are modeled as fuzzy sets and combined with the help of a
decision function into an output fuzzy set that represents the overall
assessment of the document. For a final linguistic classification, the output of the
decision function is compared with target classes. The procedure achieves
good performance if the decision function is trained on representative sets
of classified objects.

Although the classification procedure was developed for classification of
captured documents, it might also be applied to target recognition from
approximate sensor inputs, triage procedures and diagnostics in medical
practice, risk assessments, and similar problems where classification requires
the combination of uncertain judgments.


1. Introduction


Captured foreign documents can be an important source of vital information
on the battlefield. The handling of such documents is regulated by
Army Field Manual 34-52 [1] (pp 4-2 to 4-4), which stipulates that captured
documents should be assigned to one of four categories (named A, B, C,
and D), depending on the contents of the documents, and that the documents
should be dealt with according to their categories. Most important
are documents of category A, which require immediate action; documents
of categories B and C are less important, and documents of category D can
be discarded.

A problem with this classification can arise when the documents are not in
English, because then the contents of captured documents might not be
obvious. Some help is provided by the FALCON system [2], which scans the
documents and provides a quick translation and a simple computer
analysis of the translated text (mainly by keyword searching). The soldier can
use this analysis to assess the importance of the documents and eventually
to categorize the documents.

Two improvements to the system are being considered. First, it is proposed
to analyze the original text instead of its English translation [3]. Second,
the input information for classification decisions is being extended beyond
text analysis to other document attributes, such as document date,
circumstances of capture, document type, etc. Under ideal circumstances, when
the documents are in English, such additional attributes need not be
considered, because an understanding of the text outweighs all other
information. However, additional document attributes can be important when the
contents of the documents are available only through a cursory computer
analysis. Therefore, for an automatic document classification, particularly
when the source language is not English, the input from nontextual
attributes should be fused with outputs from computer text analysis. This
report addresses the incorporation of nontextual document assessments into
the classification procedure.

In the approach presented here, the importance of each document is
expressed by a numerical significance value that is based on the combined
significance indicators obtained from the attributes of the document. The
attributes can be either outputs from text analysis or the other document
properties mentioned above, and the significance indicator values are
combined by a decision function that computes an accumulated significance
value. The significance indicators can be vague linguistic expressions (such
as "medium importance," "low importance," etc.). These expressions, as
well as outputs from text analysis, are modeled by fuzzy sets on a
significance scale from zero to unity. The decision function accumulates these
fuzzy sets by calculating an output fuzzy set that represents the
significance of the document. Finally, the accumulated significance is compared
with standard significance categories (for instance, the Army's A, B, C, and
D categories) and the result interpreted and formulated in linguistic form,
such as "the document is approximately category A," "the document is
likely secret," etc.

Section 2 describes common document attributes and corresponding
significance indicators. Section 3 treats the computation and training of the
decision function, and section 4 provides examples of document
classification by attributes. A summary and conclusions are given in section 5.



2. Attribute Evaluation 


2.1 Document Attributes 

Table 1 gives a tentative list of attributes that might contribute to
document classification and can be assessed without an understanding of
document content. (Eventually, actual documents will be used to establish a
more comprehensive list.) The assumption is that to obtain the significance
level of a document, one would inspect the document and estimate values
of significance indicators for all available attributes. Each "significance
indicator" is a linguistic assessment of document significance derived from
properties of an attribute; the value of the indicator is assessed for
individual attributes independently of the properties of other attributes. For each
attribute, the table lists some properties that might be used in estimating the
value of the significance indicator. For example, the value of the indicator
for the attribute "age of document" might be estimated as "medium high"
if the document is current, and "low" if the document is several months
old. Note that any attribute can have an entry equivalent to "unknown,"
which is treated as "not significant." Such an entry does not affect the
accumulated document significance value.


Table 1. Document attributes.

Attribute                   Indicator
--------------------------  ------------------------------------------
Age of document             Current (present date)
                            Recent (few days old)
                            Weeks
                            Months
                            Unknown

Style of document           Typed
                            Handwritten
                            Printed
                            Fax
                            Carbon, photographic, or similar copy

Format of document          Military order
                            Report
                            Letter
                            Indistinct

Circumstances of discovery  Troop quarters
                            Abandoned house
                            Office
                            File cabinet
                            Open field
                            On a person

Stationery                  Military stationery
                            Business stationery
                            Loose leaf
                            Notebook

Text analysis*              Frequency of military keywords
                            Military acronyms
                            Standard military expressions
                            Frequency of a particular keyword class
                            Text arrangement in military standard form

*There might be several indicators for this group, depending on the
sophistication of the text analysis program. The list is a short tentative
selection of possible subgroups.


2.2 Linguistically Scaled Significance Categories

Values of the significance indicators are assessed in linguistic terms. For 
this I use a five-category scale of fuzzy numbers from Chen and Hwang 
[4] (p 468) with the categories "low," "medium low," "medium," "medium 
high," and "high." The membership functions of these numbers are shown 
in figure 1. For present purposes, the five-category scale is supplemented 
with two extreme categories, which correspond to "unknown significance" 
and "extremely significant." The former is represented by a crisp singleton 
at s = 0 (zero significance), and the latter is a crisp singleton at s = 1. The 
result is thus a scale of seven categories, a range that knowledge engineers 
consider optimal for linguistic estimates [4]. 
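
The seven-category scale can be sketched on a discretized significance axis. The membership shapes of figure 1 are not reproduced in the text, so the triangular shapes, peak positions, half-widths, grid resolution, and the names `S_GRID`, `triangle`, `singleton`, and `SCALE` below are all assumptions chosen for illustration; only the category names and the two crisp singletons at s = 0 and s = 1 come from the description above.

```python
import numpy as np

# Discretized significance scale s in [0, 1]; the resolution is arbitrary.
S_GRID = np.linspace(0.0, 1.0, 101)

def triangle(s, peak, half_width):
    """Triangular membership function (assumed shape, not taken from figure 1)."""
    s = np.asarray(s, dtype=float)
    return np.clip(1.0 - np.abs(s - peak) / half_width, 0.0, 1.0)

def singleton(s, at):
    """Crisp singleton: membership 1 at the grid point nearest `at`, 0 elsewhere."""
    s = np.asarray(s, dtype=float)
    mu = np.zeros_like(s)
    mu[np.argmin(np.abs(s - at))] = 1.0
    return mu

# Seven-category scale: five fuzzy numbers plus the two crisp extremes.
SCALE = {
    "unknown": singleton(S_GRID, 0.0),
    "low": triangle(S_GRID, 0.1, 0.2),
    "medium low": triangle(S_GRID, 0.3, 0.2),
    "medium": triangle(S_GRID, 0.5, 0.2),
    "medium high": triangle(S_GRID, 0.7, 0.2),
    "high": triangle(S_GRID, 0.9, 0.2),
    "extremely significant": singleton(S_GRID, 1.0),
}
```

Sampling every category on the same grid keeps later operations (maximum, minimum, center of gravity) simple array expressions.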


Figure 1. Membership functions of linguistic significance categories.


3. Significance Accumulation


3.1 Decision Function 


Let $a_i$ be the value of the significance indicator from attribute $i$. The
indicators are crisp or fuzzy numbers between zero and unity. An overall
significance level for the document is obtained by the accumulation of the
$a_i$ with the help of a decision function. Let $n$ be the number of attributes of
a given document. Then the accumulated significance $S$ of that document
is computed by

$$S = 1 - \left[\prod_{i=1}^{n} \bigl(1 - (m_i \cdot a_i)\bigr)^{w_i}\right]^{0.5\,[1+\exp(0.1(1-n))]} \qquad (1)$$


This decision function formula contains two sets of crisp parameters:
attribute modifiers $m_i$ and attribute weights $w_i$. These parameters make the
decision function adaptable to particular applications. Their values are
determined by a training procedure performed on representative sets of
classified documents (see sect. 3.4).

The modifiers $m_i$ are positive and enter the formula for $S$ as crisp
multipliers of the fuzzy indicators $a_i$. (For numerical reasons, the lower bound
of the multipliers is set equal to 0.001.) The purpose of the modifiers is to
change the fuzzy significance indicators by increasing or decreasing their
values. Because the significance measure $s$ is restricted to the interval [0,1],
a special truncating multiplication is used (instead of an ordinary
multiplication) and indicated in equation (1) by $(m_i \cdot a_i)$. That multiplication
truncates the product to the unit interval and assigns to the abscissa $s = 1$
a membership value that equals the maximum membership of those parts
of the product that have an abscissa larger than one. For instance, if the
multiplication by a large factor $m_i$ shifts the support of $a_i$ completely out
of the unit interval, then the product $(m_i \cdot a_i)$ is a crisp singleton at the
significance level $s = 1$ (linguistically, "extremely significant").
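
The truncating multiplication can be sketched for a fuzzy set sampled on a uniform grid. This is a minimal illustration, not the report's implementation: the sampled representation, the linear interpolation, the function name `truncated_multiply`, and the triangular "medium" set are assumptions. Only the lower bound 0.001 and the rule that everything pushed beyond s = 1 collapses onto s = 1 with its maximum membership come from the text.

```python
import numpy as np

def truncated_multiply(m, mu, grid):
    """Multiply a sampled fuzzy set by a crisp factor m and truncate to [0, 1]."""
    m = max(float(m), 0.001)  # lower bound on the modifiers stated in the text
    mu = np.asarray(mu, dtype=float)
    # The product has at abscissa s the membership the input has at s / m.
    out = np.interp(grid / m, grid, mu, left=0.0, right=0.0)
    # Parts of the product at or beyond s = 1 collapse onto s = 1 and
    # contribute their maximum membership there.
    overflow = mu[grid * m >= 1.0]
    if overflow.size:
        out[-1] = max(out[-1], overflow.max())
    return out

grid = np.linspace(0.0, 1.0, 101)
medium = np.clip(1.0 - np.abs(grid - 0.5) / 0.2, 0.0, 1.0)  # assumed triangular "medium"
shifted_left = truncated_multiply(0.5, medium, grid)  # peak moves to about s = 0.25
pushed_out = truncated_multiply(4.0, medium, grid)    # support leaves the unit interval
```

With the factor 4.0 the whole support of the input lies beyond s = 1 after multiplication, so the result is the singleton at s = 1, as described above. Sampled singletons smear slightly under interpolation, which is a limitation of this grid representation.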

The effects of the modifying parameters on significance membership
functions are illustrated by figure 2. The original membership ("medium
significance") is labeled with $m = 1$. If the modifying parameter is less than unity,
the membership function is shifted to the left, and its spread is reduced. For
modifying parameters larger than unity, the membership function is shifted
to the right, its spread increases, and eventually (for large $m$) it is shifted
out of the unit interval. In that case, the product equals a singleton at $s = 1$.

Figure 2. Membership function changes when parameters are modified
(curves labeled m = 0.5, 1.0, 1.5, 1.8).

The weights $w_i$ are restricted to values larger than or equal to 0.1 and
enter the formula as exponents of the contributions from the attributes $i$. A
weight $w_i$ has in essence the same effect as a $w_i$-fold repetition of the $i$th
estimate $a_i$. Figure 3 shows the effect of weight parameters on a decision
function, for simplicity assuming only one attribute and a modifying
parameter $m_1 = 1$. If the weight parameter $w_1 = 1$, the membership
function of $S$ equals the input membership, labeled with $w = 1$. If the weight
parameter is less than unity, the input membership function is shifted to
the left, and its spread is reduced. Hence, its effect is similar to that of a
small modifying parameter. For this reason, to avoid redundant
parameter adjustments, I set the minimum permissible value of $w_i$ to 0.1 (instead
of zero). For $w_1 > 1$, the membership of the decision function is shifted
to the right and its spread reduced. It never shifts out of the unit interval.
This is different from the effects of the modifying parameters, and it adds
flexibility to the decision function.

The exponent of the square brackets in equation (1) has been determined
experimentally. It reduces the contributions of individual significance
indicators when the number $n$ of attributes is large. The accumulated
significance $S$ of the document is a fuzzy set with support between zero and unity
on the significance scale $s$.
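
A minimal sketch of the decision function for crisp indicator values follows, assuming the reading of equation (1) as S = 1 − [∏(1 − (m_i · a_i))^{w_i}]^{0.5[1+exp(0.1(1−n))]}. The function names are hypothetical. The interval helper further assumes that fuzzy inputs are handled by α-cuts, which is justified only because this expression increases monotonically in every a_i; the report does not state how the fuzzy arithmetic is carried out.

```python
import math

def exponent(n):
    """Exponent of the square brackets: 0.5 * [1 + exp(0.1 * (1 - n))]."""
    return 0.5 * (1.0 + math.exp(0.1 * (1 - n)))

def accumulate_crisp(a, m, w):
    """Accumulated significance for crisp indicators a_i in [0, 1]."""
    product = 1.0
    for a_i, m_i, w_i in zip(a, m, w):
        # Truncating multiplication for a crisp value; bounds as stated in the text.
        t = min(1.0, max(m_i, 0.001) * a_i)
        product *= (1.0 - t) ** max(w_i, 0.1)
    return 1.0 - product ** exponent(len(a))

def accumulate_interval(cuts, m, w):
    """Map alpha-cut intervals (lo, hi) of the inputs to an interval of S.

    Assumption: because S is nondecreasing in each a_i, the interval
    endpoints of S are obtained from the endpoints of the inputs.
    """
    lo = accumulate_crisp([c[0] for c in cuts], m, w)
    hi = accumulate_crisp([c[1] for c in cuts], m, w)
    return lo, hi

single = accumulate_crisp([0.3], [1.0], [1.0])    # one attribute, m = w = 1
weighted = accumulate_crisp([0.3], [1.0], [2.0])  # weight 2 acts like a repetition
```

For one attribute with m = w = 1 the exponent equals 1 and the output reproduces the input, which matches the behavior described for figure 3.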

3.2 Target Categories 

The purpose of document classification is to assign each document to a
category from a predefined set of target categories. In the approach presented
here, the assignments are done by comparison of the accumulated
significance $S$ (a fuzzy set) with fuzzy sets that represent the target categories.

Target categories are in this case the four captured document categories
defined in the relevant Army Field Manual [1]. These categories are modeled
as fuzzy sets over the significance scale shown in figure 4. For consistency
with the set of input categories, the set is supplemented with two extreme
categories, D− and A+, which are represented as singletons at $s = 0$ and
$s = 1$, respectively. This is an arbitrary representation of the Army's
categories, because the latter are defined in terms of the textual contents of
the documents and not by their significance level. However, because the
analysis presented here is based on fuzzy-set representation and the
decision function is determined by a training procedure, a fuzzy-set
representation of the target categories is adequate. It suffices that the membership
functions of the four categories are in ascending order; the details of the
functions are less important.

Figure 3. Effects of weight parameters on decision function.

Figure 4. Membership functions of captured document categories
(categories along the significance axis: D−, D, C, B, A, A+).


3.3 Proximity Indicators 

To train the decision function, one needs a measure of the goodness of
document classification, that is, a measure of the deviation of a category
implied by the decision function from the known category of a document.
Because the accumulated significance and the target categories are fuzzy
sets, the goodness measure must be a measure of disparities between fuzzy
sets. For present purposes, the disparity between an accumulated
significance $S$ and a target category $C$ is expressed by three proximity indicators,
which are referred to as separation, discord, and exclusion.

The separation is the difference between the defuzzified values of $S$ and
$C$, where the defuzzified values are computed by the center of gravity
method. Let $\mu_S(s)$ and $\mu_C(s)$ be the membership functions of $S$ and $C$,
respectively, and $G_S$ and $G_C$ be the corresponding defuzzified values. The
defuzzified value $G_S$ (the center of gravity of $\mu_S(s)$) is computed by

$$G_S = \frac{\int_0^1 s\,\mu_S(s)\,ds}{\int_0^1 \mu_S(s)\,ds} \qquad (2)$$

and $G_C$ is computed correspondingly. The separation of $S$ from a target
category $C$ is defined by

$$P_{SC} = G_S - G_C . \qquad (3)$$

The separation $P_{SC}$ can have any value between -1 and 1, and its sign
indicates whether $S$ is mainly to the left or mainly to the right of $C$.
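
On a uniform grid the center of gravity and the separation reduce to ratios of sums. The sketch below assumes sampled membership functions; the function names and the two example sets (a triangular "medium" and the singleton at s = 1) are illustrative choices, not taken from the report.

```python
import numpy as np

def center_of_gravity(mu, grid):
    """Discrete center of gravity: sum(s * mu) / sum(mu) on a uniform grid."""
    mu = np.asarray(mu, dtype=float)
    total = mu.sum()
    if total == 0.0:
        raise ValueError("an empty fuzzy set has no center of gravity")
    return float((grid * mu).sum() / total)

def separation(mu_s, mu_c, grid):
    """Separation P_SC = G_S - G_C; negative when S lies mainly left of C."""
    return center_of_gravity(mu_s, grid) - center_of_gravity(mu_c, grid)

grid = np.linspace(0.0, 1.0, 101)
medium = np.clip(1.0 - np.abs(grid - 0.5) / 0.2, 0.0, 1.0)  # assumed shape
top = np.zeros_like(grid)
top[-1] = 1.0                                               # singleton at s = 1
p = separation(medium, top, grid)
```

The ratio of sums also handles a sampled singleton, whose center of gravity is simply its grid position.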

The discord [5] between the two fuzzy sets $S$ and $C$ is defined by

$$D_{SC} = 1 - \max_{s}\,\min\bigl(\mu_S(s), \mu_C(s)\bigr). \qquad (4)$$

The discord varies between zero and unity. It equals zero when the cores of
the two fuzzy numbers intersect, and it equals unity when their supports
do not intersect. The discord expresses the lack of intensity of coincidence
between the two fuzzy sets.
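
For sampled membership functions the discord is one minus the height of the pointwise minimum. The sketch below uses assumed triangular shapes and hypothetical names.

```python
import numpy as np

def discord(mu_s, mu_c):
    """Discord D_SC = 1 - max_s min(mu_S(s), mu_C(s)) for sampled sets."""
    overlap = np.minimum(np.asarray(mu_s, dtype=float), np.asarray(mu_c, dtype=float))
    return 1.0 - float(overlap.max())

grid = np.linspace(0.0, 1.0, 101)

def tri(peak, half_width=0.2):
    """Assumed triangular membership function sampled on the grid."""
    return np.clip(1.0 - np.abs(grid - peak) / half_width, 0.0, 1.0)

same = discord(tri(0.5), tri(0.5))      # cores coincide
partial = discord(tri(0.5), tri(0.7))   # sets cross at membership of about 0.5
disjoint = discord(tri(0.1), tri(0.9))  # supports do not intersect
```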

The exclusion is a measure for the lack of overlap of the membership
functions $C$ and $S$. The exclusion $E_C(S)$ of a target category $C$ from $S$ is defined
by


(5)


Note that the exclusion is not symmetric. It equals zero when the
membership function $\mu_C(s)$ of the target category $C$ is contained entirely in the set
$\mu_S(s)$. It equals unity when $S$ and $C$ do not intersect. Otherwise $E_C(S)$ is
positive and less than or equal to unity. If $\mu_C(s)$ is a singleton at $s = s_C$,
then the exclusion is defined by

$$E_C(S) = 1 - \mu_S(s_C). \qquad (6)$$

(In this case, the exclusion and the discord are identical.)
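
The general formula for the exclusion is not legible in this copy, so the following is only a candidate form, stated as an assumption: the share of the area under $\mu_C$ that is not covered by $\mu_S$. It is offered because it reproduces every property listed above (asymmetry, zero when C is contained in S, unity when the sets are disjoint, and the singleton case of equation (6)); it should not be read as the report's definition. The function name and the example sets are hypothetical.

```python
import numpy as np

def exclusion(mu_c, mu_s):
    """Assumed form: 1 - sum(min(mu_C, mu_S)) / sum(mu_C) on a uniform grid."""
    mu_c = np.asarray(mu_c, dtype=float)
    mu_s = np.asarray(mu_s, dtype=float)
    area_c = mu_c.sum()
    if area_c == 0.0:
        raise ValueError("target category has empty membership")
    return 1.0 - float(np.minimum(mu_c, mu_s).sum() / area_c)

grid = np.linspace(0.0, 1.0, 101)
narrow = np.clip(1.0 - np.abs(grid - 0.5) / 0.1, 0.0, 1.0)
wide = np.clip(1.0 - np.abs(grid - 0.5) / 0.2, 0.0, 1.0)
point = np.zeros_like(grid)
point[60] = 1.0  # sampled singleton at s = 0.6

contained = exclusion(narrow, wide)   # C inside S
reverse = exclusion(wide, narrow)     # not symmetric
single = exclusion(point, wide)       # equals 1 - mu_S(0.6) for a singleton C
```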

For the training of the decision function, a disparity between two fuzzy sets
$S$ and $C$ is defined by

$$d_S(C) = \frac{D_{SC} + E_C(S)}{2}\, P_{SC} . \qquad (7)$$

The disparity $d_S(C)$ is a crisp number between -1 and +1.


3.4 Decision Function Training 

The relative importance of attributes for document classification is modeled
by the values of the decision function parameters $m_i$ and $w_i$. To determine
the proper values of these parameters, one would classify a set of
documents with known significance levels and find such parameter values that
the document set overall is correctly classified. Let the training set consist
of $k$ documents with significance categories $C_j$, $j = 1, \ldots, k$. Each
document is also characterized by a set of significance indicators according to
its attributes. The significance indicators are accumulated by the decision
function (eq (1)), providing for each document a fuzzy document
significance $S_j$. The goal of the training is to modify the decision function
parameters such that the fuzzy sets $S_j$ are close to the corresponding target sets
$C_j$. As a measure of agreement, I use the sum $U$ of squares of disparities
between the accumulated significances and the target categories, where the
disparities are computed with equation (7):

$$U(m_1, m_2, \ldots, m_n, w_1, w_2, \ldots, w_n) = \sum_{j=1}^{k} \bigl[d_S^{(j)}(C)\bigr]^2 . \qquad (8)$$

In equation (8), $d_S^{(j)}(C)$ is the disparity between the accumulated significance
$S_j$ and the target significance $C_j$ for the document $j$. The training
parameters are the attribute modifiers $m_i$ and the weights $w_i$; see equation (1).
They are determined by a steepest descent algorithm on the objective
function $U$ in the parameter space. The partial derivatives of $U$ that are needed
for the steepest descent algorithm are numerically approximated by
difference quotients.
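
The training loop can be sketched as a bounded steepest descent with numerically approximated derivatives. Everything specific here is an assumption for illustration: the central difference quotient, the fixed step length `rate`, the stopping rule, and the toy quadratic objective that stands in for $U$. Only the use of steepest descent, difference quotients, and the lower bounds (0.001 for modifiers, 0.1 for weights) come from the text.

```python
import numpy as np

def numerical_gradient(objective, params, h=1e-4):
    """Approximate each partial derivative by a central difference quotient."""
    grad = np.zeros_like(params)
    for i in range(params.size):
        step = np.zeros_like(params)
        step[i] = h
        grad[i] = (objective(params + step) - objective(params - step)) / (2.0 * h)
    return grad

def steepest_descent(objective, start, lower, rate=0.1, max_iter=200, tol=1e-8):
    """Descend along the negative gradient, clipping parameters at their lower bounds."""
    lower = np.asarray(lower, dtype=float)
    params = np.maximum(np.asarray(start, dtype=float), lower)
    for _ in range(max_iter):
        candidate = np.maximum(params - rate * numerical_gradient(objective, params), lower)
        if abs(objective(params) - objective(candidate)) < tol:
            return candidate
        params = candidate
    return params

# Toy stand-in for U with one modifier and one weight (hypothetical values).
target = np.array([1.0, 0.05])
toy_objective = lambda p: float(((p - target) ** 2).sum())
result = steepest_descent(toy_objective, start=[0.5, 0.5], lower=[0.001, 0.1])
```

In the toy run the second parameter would like to reach 0.05 but is held at its lower bound 0.1, which mirrors how a weight can settle at the smallest permissible value during training.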


3.5 Generation of Training Sets


In generating synthetic "documents" for use in training sets, one must
model the attributes that characterize a given document set. The most
useful attributes in this context are those that correlate highly with the
classification of the document. For instance, assume that for a certain document
type, two attributes (e.g., date and keywords) classify the document
correctly most of the time. In this instance, the significance indicators from
these two attributes should correlate with the document significance level.
Therefore, training sets that represent that document type should be so
constructed that sample correlation coefficients between the significance
indicators of the two attributes and the corresponding document categories
have values close to unity.

This section outlines the construction of training sets with prescribed
correlations that can be used to test the training program. In real-life
applications, a training set consists of documents that are considered typical for
the relevant scenario.

Let the number of documents in the training set be $k$. Each "document" in
the training set is represented by its target class $C_j$, $j = 1, \ldots, k$, and a list
of significance indicators $a_i^{(j)}$:

$$a_1^{(j)},\ a_2^{(j)},\ a_3^{(j)},\ \ldots,\ a_n^{(j)} .$$

For construction of the training sets, sample correlation coefficients $\rho_i$,
$i = 1, \ldots, n$, are prescribed between $a_i^{(j)}$ and $C_j$.

The construction of training sets begins with a set of $k$ crisp data points
$(x_j, y_j)$, $j = 1, \ldots, k$. The sample correlation coefficient of the $k$ data pairs
$(x_j, y_j)$ is defined by

$$\gamma_k(x, y) = \left[\sum_{j=1}^{k} (x_j - \bar{x})^2 \sum_{j=1}^{k} (y_j - \bar{y})^2\right]^{-1/2} \sum_{j=1}^{k} (x_j - \bar{x})(y_j - \bar{y}) , \qquad (9)$$

where $\bar{x}$ and $\bar{y}$ are average values of the $x_j$ and $y_j$, respectively. To obtain
a set with a prescribed positive correlation $\gamma_0$, one can proceed as follows.
First, two different values $x_1$ and $x_2$ are chosen at random, and corresponding
$y$-values are set equal to the $x$-values: $y_1 = x_1$ and $y_2 = x_2$. Next, the
remaining $x_j$ for $j = 3, \ldots, k$ are chosen at random, and the corresponding
$y_j$ are calculated such that the correlation

$$\gamma_j(x, y) = \left[1 - (1 - |\gamma_0|) \cdot (j/k)\right] \cdot \operatorname{sgn} \gamma_0 . \qquad (10)$$

The calculation is done by numerical search alternating above and below
the line through the first two data points. If the prescribed $\gamma_0$ is negative,
then one can use the same algorithm by changing the initial $y$-values to
$y_1 = x_2$ and $y_2 = x_1$.

Figure 5. Data set with sample correlation 0.9.

For this algorithm to be used for document generation, some adjustments
are necessary. First, the values of $x$ and $y$ are restricted to the unit interval,
and corresponding restrictions for the random choice and search algorithm
apply. Second, for the initial data, one can use without loss of generality
$x_1 = 0.2$ and $x_2 = 0.8$ instead of random values. Then only the subsequent
$x_j$ are chosen randomly from a uniform distribution over the unit interval.
Finally, for generating randomized $y_j$, the algorithm for the computation
of $y_j$ was augmented as follows. Let $\hat{y}_j$ be a value of $y$ that produces the
desired correlation coefficient $\gamma_j$, and let

$$\tilde{y}_j = y_1 + (y_2 - y_1)\,\frac{x_j - x_1}{x_2 - x_1} \qquad (11)$$

be the ordinate of the intersection point of the line $x = x_j$ with the line
through the first two data points. The value $y_j$ is randomly chosen from an
interval bounded by

$$y_{\text{base}} = \tilde{y}_j + (\hat{y}_j - \tilde{y}_j) \cdot (j/k) \qquad (12)$$

and

$$y_{\text{end}} = \min\{1, \max[0,\ y_{\text{base}} + 2(\hat{y}_j - y_{\text{base}})]\} . \qquad (13)$$

The interval shrinks as $j$ increases, and it has zero length for the final data
point with $j = k$, which ensures a correct final correlation coefficient for
the set. Figure 5 shows a set with 100 data points, one attribute, and a
sample correlation coefficient of 0.9, which was generated by the described
algorithm.

A conversion of the crisp data into fuzzy categories is needed to
generate fuzzy significance estimates. For conversion (fuzzification) of the data,
the $y$-axis was subdivided in segments corresponding to the linguistic
categories (fig. 1) and the $x$-axis in segments corresponding to the Army's
categories (fig. 4); fuzzy values $a_i^{(j)}$ and $C_j$ were assigned according to the
compartment in which each point $(x_j, y_j)$ was located.
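
The building blocks of the generation procedure can be sketched as three small functions: the sample correlation of equation (9), the correlation schedule of equation (10), and the sampling interval of equations (12) and (13). The function names are hypothetical, the bracketing in the schedule follows the reading of equation (10) given above, and the numerical search for the value that produces the scheduled correlation is deliberately left out.

```python
import numpy as np

def sample_correlation(x, y):
    """Sample correlation coefficient of paired data (equation (9))."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

def target_correlation(j, k, gamma0):
    """Correlation required after j of k points (reading of equation (10))."""
    return float((1.0 - (1.0 - abs(gamma0)) * (j / k)) * np.sign(gamma0))

def sampling_interval(y_line, y_exact, j, k):
    """Interval bounds for a randomized y_j (equations (12) and (13)).

    y_line  : ordinate on the line through the first two points (equation (11))
    y_exact : value of y that produces the scheduled correlation
    """
    y_base = y_line + (y_exact - y_line) * (j / k)
    y_end = min(1.0, max(0.0, y_base + 2.0 * (y_exact - y_base)))
    return y_base, y_end

start = sample_correlation([0.2, 0.8], [0.2, 0.8])  # two points on a line
final = target_correlation(100, 100, 0.9)           # schedule ends at gamma_0
last = sampling_interval(0.5, 0.7, 10, 10)          # zero-length interval at j = k
```

The interval is centered on the exact value and collapses onto it for the last point, which is why the finished set has exactly the prescribed correlation.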

3.6 Linguistic Interpretation 

The output of the decision function is a fuzzy set $S$ on the significance
scale. One obtains a linguistic interpretation of the result by comparing $S$
with fuzzy sets that represent target categories. The comparison is done
in terms of the disparity $d_S(C)$ between fuzzy sets, and the document is
assigned to the category that is closest to the accumulated significance in
terms of the disparity. If $S$ is between two target categories, then a
corresponding hedge is added to the linguistic interpretation (for example,
"with low confidence").

As an illustration of the classification procedure, consider a simple case
with only three attributes of equal weights ($w_i = 1$) and modifying
parameter values $m_1 = 1.5$, $m_2 = 0.4$, and $m_3 = 0.1$. Let the corresponding
significance indicators from the three attributes be "medium low," "medium,"
and "high." Figure 6(a) shows the input fuzzy sets. After multiplication
with the parameters $m_i$, one obtains a modified set of three indicators,
shown in figure 6(b). Accumulation of these three indicators by equation
(1) produces a fuzzy set $S$, shown in figure 6(b) by a heavy solid line. Next,
$S$ is compared with target categories (see fig. 4); the comparison is shown
in figure 6(c). Obviously, $S$ is between categories B and C, and somewhat
closer to B. The linguistically formulated result in this example is as follows:

Document category is approximately B: Likely SECRET information with low
confidence.
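
The assignment rule can be sketched as picking the category with the smallest absolute disparity. The criterion used below for "between two categories" (the two closest categories have disparities of opposite sign, assuming the sign follows the separation) is an assumption, as are the function name and the disparity values, which are invented for illustration and are not results from the report.

```python
def closest_category(disparities):
    """Return (category, hedged) from a mapping category -> signed disparity."""
    ranked = sorted(disparities, key=lambda name: abs(disparities[name]))
    best = ranked[0]
    hedged = False
    if len(ranked) > 1:
        runner_up = ranked[1]
        # Opposite signs: S lies left of one category and right of the other.
        hedged = disparities[best] * disparities[runner_up] < 0
    return best, hedged

# Hypothetical disparities for a set S that lies between C and B, closer to B.
example = {"A": -0.60, "B": -0.05, "C": 0.12, "D": 0.55}
category, hedged = closest_category(example)
```

With these values the sketch selects category B and flags the result for a hedge such as "with low confidence."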


Figure 6. Example of a decision process: (a) input sets (line styles
represent different sets); (b) sets after multiplication (solid line is fuzzy
set S); and (c) comparison of S (solid line) with target categories (dashed
lines).


4. Example


4.1 Training and Test Sets 

Consider an example of training that was performed on a training set of 100
synthetic "documents," generated as described in section 3.5. The number
of attributes is $n = 4$, so that the input from each document consists of five
items:

$$C_{\text{target}},\ a_1,\ a_2,\ a_3,\ a_4 .$$

The prescribed sample correlation coefficients $\rho_i = \rho(C_{\text{target}}, a_i)$, $i = 1, 2, 3, 4$,
were as follows:

$$\rho_1 = 0.9, \quad \rho_2 = 0.9, \quad \rho_3 = 0.1, \quad \rho_4 = -0.9 .$$

Accordingly, the first two attributes of the documents provide mostly
correct information about document significance, while the information from
attributes 3 and 4 is mostly false. We would therefore expect that, after the
training, the first two attributes will be weighted more heavily than the last
two.

A test set, also consisting of 100 "documents," was established in the same
manner, with the only difference being that a different seed number was
used in the random number generation routine.

4.2 Training Results 

The training of the decision function required 36 iterations. The training
history is illustrated in figure 7. Figure 7(a) displays the value of the
objective function $U$ over the number of iteration steps. It shows that a
minimum of $U$ was found after about 29 iterations; the iteration was continued
because of conservative iteration end conditions.

Figure 7(b) shows the development of the modifying parameters $m_i$. By
the end of the iteration, the parameters of the "good" attributes, $m_1$ and
$m_2$, had settled to values of about unity. The "bad" attributes (3 and 4)
both converged to very small values, making the contributions from these
attributes negligible. The value $m_3 = 0.001$ is the lower bound for the
modifying parameters.

Figure 7(c) shows the development of the weight parameters $w_i$. The
weights of the "good" attributes converge to values close to 0.4. The
weight $w_3$ of attribute 3 has a similar value, but that attribute is already
eliminated by its very small modifying parameter. The weight of attribute
4 converges to the smallest permissible value, 0.1. Because the parameters
$m_i$ and $w_i$ have, for small values, similar effects on the decision function, an
attribute can be eliminated by a small value of either parameter, and there
is no improvement from reducing the corresponding parameter from the
other parameter set.

Figure 7. Training history: (a) value of U over iteration; (b) development
of parameters m_i; and (c) development of parameters w_i (horizontal axis:
iteration step).

The final values of the parameters were as follows: 

$m_1 = 1.251$, $m_2 = 0.813$, $m_3 = 0.001$, $m_4 = 0.173$

$w_1 = 0.382$, $w_2 = 0.360$, $w_3 = 0.326$, $w_4 = 0.100$

In principle, attribute 4 with its large negative correlation could be used for
classification if I inverted its significance indicator. However, the present
setup of the decision function training does not allow such a usage: the
model is based on the assumption that significance information from
attributes is correct. If this is not the case for documents in the training set,
then the corresponding attribute is suppressed.*

* A modification of the decision function that allows it to handle negative
correlations is presented elsewhere [6].

The capability of the trained decision function to classify documents of the
training set is illustrated in figure 8. This figure shows the distributions of
classification errors by attribute significance indicators and, with the label
"0," the classification errors by the decision function. For this display, a
classification error is expressed in terms of the difference between the correct
document class and the implied class by attributes and decision function,
respectively. Because there are six target categories, a classification error
is an integer with values between -5 and +5. Accordingly, the error
distributions are shown in 11 bins for each prediction. The figure shows that
attributes 1 and 2 classify about 40 percent of the documents correctly and
the remaining documents mostly with an error of +1 or -1 category. The
classification errors of attribute 3 (with correlation 0.1) are randomly
distributed, and the classification errors of attribute 4 (with correlation -0.9)
have a bimodal distribution. The distribution labeled "0" shows the
distribution of classification errors by the trained decision function. It correctly
classifies over 60 percent of the documents, which indicates that combining
attribute significance indicators does indeed improve classification quality.

Figure 8. Classification
To test the success of the training, I prepared a test set of 100 documents
with the same characteristics as the training set (i.e., with the same
correlations between attribute significance indicators and document classes). The
classification results for this set are shown in figure 9 in the same form as
figure 8. The distributions of classification errors have a pattern similar to
that in figure 8 and confirm the success of the training.


Figure 9. Classification of a test set.


5. Conclusions


A classification procedure has been devised in which attribute-provided
classifications are combined with the help of a fuzzy decision function. The
decision function contains parameters that are determined by the function
being trained on representative training sets. The trained function
classifies with fewer errors than classifications based on any single attribute.
Because of the simplicity of the decision function and the transparency of
the roles of the parameters, the training also provides indications about
the importance of each attribute. Unimportant attributes are indicated by
small parameter values and can be disregarded in subsequent
classifications, thereby simplifying the classification process.

To improve the performance of the described classification method, an
investigation of the effects of the decision function's structure would be
helpful. In particular, the function parameters $m_i$ and $w_i$ might be supplemented
or replaced by different parameters and the effects of such changes on
classification performance studied.

The described classification procedure has potential applications well
beyond the immediate use described in this report. Possible applications
include document classification, target recognition, triage procedures,
diagnostics, risk assessment, and other classification problems where the result
depends on attribute properties.